JOURNAL 
OF 
SOUTH AFRICAN BOTANY 
VOL. 33 


Published: 2ND OCTOBER 1967 


NOTES ON NUMERICAL ANALYSIS 
AS APPLIED TO VEGETATION 
CLASSIFICATION 


B. R. ROBERTS 
(Department of Pasture Science, University of the Orange Free State) 


Like other biological scientists, ecologists require, for a variety of reasons,
some form of classification of their material. Basically, any classification aims
to order individuals into groups on the basis of their relationships and the 
theoretical study of classification has led to the discipline of taxonomy, which is 
concerned with the theory and principles of such grouping. In South Africa, as 
in many other countries, vegetation studies have included a number of basically 
very different systems of community classification and in recent years the appli- 
cation of so-called objective methods has received an increasing amount of 
attention. This paper aims to emphasize a few fundamental concepts in vegeta- 
tion classification which may be considered by those who, while they may have 
little aptitude for mathematical theory, must decide on the most appropriate 
approach to classification in the field. 

A detailed comparison of the approaches used in classification and ordination 
of vegetation has been made elsewhere (Roberts, 1966) and in the present dis- 
cussion it is assumed that statistical methods are required if the desired level 
of efficiency is to be attained in the investigations concerned. 

As the importance attached to objective, quantitative and statistical methods 
in ecology increases, so the dangers of the application of inappropriate mathe- 
matical procedures become more apparent. At the same time the scientific 
respectability which statistical treatment of the data presumably lends to 
quantitative investigations in no way guarantees the validity of the conclusions
reached. Indeed, the mere application of objective methods, which do no more
than eliminate personal bias, will only give meaningful results if the statistical 
procedures used are suitable for the purpose at hand. Clearly what is required 
is: 

(a) an unbiased sampling technique; 

(b) the use of meaningful classification criteria;

(c) the computation of end groups on the most appropriate basis. 

Failure to meet any one of these basic requirements may invalidate the entire 
investigation and it is the appropriateness of the basis of grouping that is of 
primary interest in the present discussion. 

If ecological investigation is to expose rather than impose groups or taxa, 
then new units must be recognized on the basis of either similarities or differ- 
ences between individuals. Recognition and grouping of similarities gives what 
is referred to as agglomerative classification while grouping according to 
differences leads to a divisive classification. In addition, grouping may be 
based on either a single or on several attributes (e.g. species). Mathematical 
efficiency will largely determine which combination of the above alternatives is 
indicated, so that these primary decisions regarding choice of technique are 
basic to the ecological validity of the results obtained. 


An example of divisive monothetic (single attribute) grouping is that of 
Williams & Lambert (1959) known as association analysis. It is claimed that 
this method gives what is probably the most efficient division of the vegetation 
because it is based on the most important floral discontinuities. In other words, 
it divides the plant population on the basis of those species which display the 
highest degree of association with all other species. The original method has 
been modified and extended to give a number of progressive stages in association 
analysis, e.g. nodal and inverse analysis (Williams & Lambert, 1961a). However, 
even in the more recent refinements of the method (Lance & Williams, 1965) 
the approach remains basically divisive and monothetic. Despite the disad- 
vantages inherent in all monothetic methods, Greig-Smith (1966) regards 
Williams & Lambert’s association analysis as the most satisfactory of the readily
available classificatory techniques. Association analysis has been used by several 
workers in South Africa recently (van der Walt, 1962; Grunow, 1964; Miller, 
1966) and the writer has discussed the practical application of the method 
when used as a basis for vegetation/habitat studies in the Orange Free State 
(Roberts, 1966). 
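
The choice of dividing species described above can be sketched as follows. This is a minimal illustration and not Williams & Lambert's programme: it assumes presence and absence data, uses the uncorrected χ² for each 2 × 2 species pair, sums χ² over all other species and divides on the species with the largest sum. The quadrat data and the function names are hypothetical, and the significance threshold that would stop further subdivision is omitted.

```python
from itertools import combinations

def chi_square_2x2(col_a, col_b):
    """Uncorrected chi-square for two presence/absence columns."""
    n = len(col_a)
    a = sum(1 for x, y in zip(col_a, col_b) if x and y)        # both present
    b = sum(1 for x, y in zip(col_a, col_b) if x and not y)    # first only
    c = sum(1 for x, y in zip(col_a, col_b) if not x and y)    # second only
    d = n - a - b - c                                          # both absent
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:   # species present in all or in no quadrats: no association
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def summed_chi_square(quadrats, species):
    """Sum, for each species, its chi-square with every other species."""
    columns = {s: [s in q for q in quadrats] for s in species}
    totals = {s: 0.0 for s in species}
    for s, t in combinations(species, 2):
        chi2 = chi_square_2x2(columns[s], columns[t])
        totals[s] += chi2
        totals[t] += chi2
    return totals

def divide(quadrats, species):
    """Split the quadrats on the species with the largest summed chi-square."""
    totals = summed_chi_square(quadrats, species)
    best = max(species, key=lambda s: totals[s])
    present = [q for q in quadrats if best in q]
    absent = [q for q in quadrats if best not in q]
    return best, present, absent

# Hypothetical data: each quadrat is the set of species recorded in it.
quadrats = [
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"A", "B"}, {"A", "C"},
    {"D"}, {"B", "D"}, {"C"}, {"D"},
]
species = ["A", "B", "C", "D"]

best, present, absent = divide(quadrats, species)
print(best, len(present), len(absent))
```

In a full analysis each of the two resulting groups would be examined again in the same way until no significant association remained.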

In view of the increasing attention being paid to association analysis as a 
basic classificatory procedure, the limitations of this type of analysis as referred 
to by Ivimey-Cook and Proctor (1966) are worthy of note. These authors 
emphasize that the criterion of heterogeneity (highest χ² value) is not necessarily
indicative of population heterogeneity. Nor does division of the data necessarily
indicate discontinuity, since the existence of any significant association within
continuously distributed data will cause a division. The procedure sets a limit 
below which the remaining group of quadrats cannot be subdivided however 
heterogeneous they are. Dividing species are limited to those which actually 
occur in the data, although these may not necessarily be the best divisions. The 
hierarchical arrangement produced is not necessarily a hierarchy of relationship 
(Ivimey-Cook and Proctor, 1966). Recently the results obtained by association 
analysis have been compared with those obtained by other methods of classi- 
fication as applied to actual field situations and close similarities between the 
groupings as given by association analysis and other methods of clustering 
have been shown (Anderson, 1966). Sokal and Sneath (1963) report a comparison 
of association analysis and a polythetic method using Sneath’s coefficient of 
similarity (which forms clusters on the basis of single linkage of similarity 
values) when applied to data provided by Lambert from 56 quadrats. It was 
found that when nodal analysis (an extension of association analysis) was applied 
the dendrograms given by the two methods were very similar in differentiating 
most vegetation types. The same applied to the comparison with normal and 
inverse analyses (Williams & Lambert, 1961a). However, Sokal & Sneath point 
out that the polythetic method was “less sensitive to chance presence or absence
in a quadrat of any single species”. Thus in association analysis, in which
clustering depends on the presence of a single species, a quadrat may occasionally
be separated, by the classification, from quadrats to which it is very similar floristically.

Theoretically any monothetic system of classification will carry the risk of 
misclassifying units when natural phenetic groups are sought. Sokal and Sneath 
(1963) explain that this is because an organism (e.g. community) which lacks 
the all-important attribute (e.g. species) which is used to make the primary 
division, will always be assigned to a group far from the “required position”, 
even though it closely resembles its natural neighbours in all other attributes. 
“The disadvantage of monothetic groups is that they do not yield natural taxa, 
except by a lucky choice of the feature used for division. The advantage of 
monothetic groups is that keys and hierarchies are readily made” (Sokal & 
Sneath, 1963). 
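
The risk described above can be shown with a small, entirely hypothetical example. Quadrat Q3 lacks only the dividing species "A", yet a monothetic division places it with quadrats with which it shares nothing, while a simple count of shared species places it next to Q1 and Q2. The data and names are assumptions made for illustration only.

```python
# Hypothetical quadrats, each given as the set of species present.
quadrats = {
    "Q1": {"A", "B", "C", "D", "E"},
    "Q2": {"A", "B", "C", "D", "E"},
    "Q3": {"B", "C", "D", "E"},      # lacks only the dividing species
    "Q4": {"F", "G"},
    "Q5": {"F", "G", "H"},
}
dividing = "A"

# Monothetic division: membership depends on one species alone.
positive = sorted(n for n, spp in quadrats.items() if dividing in spp)
negative = sorted(n for n, spp in quadrats.items() if dividing not in spp)

def most_similar(name):
    """Quadrat sharing the greatest number of species with the named one."""
    others = [n for n in quadrats if n != name]
    return max(others, key=lambda n: len(quadrats[name] & quadrats[n]))

print(positive, negative, most_similar("Q3"))
```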


In comparison, polythetic methods may be expected to yield more natural 
groupings because they cluster those units which have the greatest number of 
attributes in common. Nor is any one attribute sufficient or essential for member- 
ship of the group in a polythetic system. The polythetic divisive methods are at 
present computationally impractical (Greig-Smith, 1966) so agglomerative 
methods are indicated. Agglomerative methods all have the disadvantage that 
since they are built up from the lowest level, they often include much unwanted
data and are computationally slow. Natural taxa have been described as “sets
composed of all those elements which share x or more features out of y features, 
where x and y are large numbers, but in which an element may have any com- 
bination of features as long as the total number of features shared with every 
other element of the set is x or more” (Sneath, 1961). To arrive at a natural 
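
Sneath's definition can be checked mechanically. The sketch below assumes each element is recorded as the set of features it possesses; the function name and the data are hypothetical, and the numbers are kept small for readability although the definition speaks of large x and y. In the example every pair of elements shares four of the six features, yet no single feature is common to all elements.

```python
from itertools import combinations

def shares_at_least(elements, x):
    """True if every pair of elements has x or more features in common."""
    return all(len(a & b) >= x for a, b in combinations(elements, 2))

features = set(range(1, 7))                              # y = 6 features
# Six hypothetical elements, each lacking a different single feature.
elements = [features - {f} for f in sorted(features)]

print(shares_at_least(elements, 4))     # every pair shares exactly 4 features
print(set.intersection(*elements))      # features possessed by all elements
```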
classification the ecologist has to answer certain basic questions which have 
ecological rather than statistical significance:

1. What material is to be classified?

2. What attributes should be used as classification criteria?

3. Should these criteria be weighted?

4. Should grouping be based on similarities or differences?

5. What method of clustering should be adopted?

While the details of procedure may depend largely on the aims of the 
investigation, a few alternative methods of classifying vegetation when presence 
and absence data are used to establish similarities may be considered. All 
forms of cluster analysis refer to numerical techniques used for defining groups 
of taxonomic units based on high similarity co-efficients and, together with
association analysis, the following have been suggested as efficient methods: 


1. Single Linkage Clustering (Sneath, 1957). 

This first clusters those individuals which are most closely related by the 
highest possible co-efficient of similarity. The level of similarity is then succes- 
sively lowered, allowing more individuals to enter the original groups. Thus 
the taxonomic units A and B will combine when the co-efficient of similarity
is, say, 0.9; when it is 0.8 taxonomic unit C joins A and B and when it reaches
0.7 unit D enters. The admission of one unit or cluster into another is by the
criterion referred to as single linkage. However, chaining may occur when 
members at each end of a cluster have a relatively large taxonomic distance 
between them, although each is very close to its nearest neighbour. To overcome 
chaining, Sneath suggests re-calculating the mean similarity values both within 
and between clusters at any of a number of hierarchic levels. 
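
The successive lowering of the similarity level may be sketched as follows. The similarity values are hypothetical and are chosen only to reproduce the A, B, C, D sequence given above; the function name is likewise an assumption. Two clusters fuse as soon as any one pair of their members reaches the current level, which is the single linkage criterion.

```python
def single_linkage(units, sim, level):
    """Clusters formed by chains of pairwise similarities >= level."""
    clusters = [{u} for u in units]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: one sufficiently similar pair is enough.
                if any(sim[frozenset((a, b))] >= level
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

units = ["A", "B", "C", "D"]
# Hypothetical co-efficients of similarity between pairs of units.
sim = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.6,
    frozenset(("B", "C")): 0.8, frozenset(("A", "D")): 0.3,
    frozenset(("B", "D")): 0.5, frozenset(("C", "D")): 0.7,
}

for level in (0.9, 0.8, 0.7):
    print(level, single_linkage(units, sim, level))
```

Note that D enters at 0.7 through its link with C alone, although its similarity to A is only 0.3; this is the chaining effect in miniature.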


2. Clustering by complete linkage. 

This is Sorensen’s (1948) method and corresponds to Sneath’s method, 
except that the joining of a taxonomic unit to an existing cluster is on the basis 
of what is termed complete linkage. This implies that the unit concerned must 
display similarities to every member of the cluster and not just with one member 
as in single linkage. The type of clustering arrived at will obviously differ with 
different initial levels of similarity co-efficient. 
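
The complete linkage criterion may be sketched in the same manner. This is an illustration of the criterion only and does not attempt to reconstruct the published procedure; the similarity values and the function name are hypothetical. At each step the two clusters whose least similar pair of members is most alike are fused, and fusion stops when that value falls below the chosen level.

```python
def complete_linkage(units, sim, level):
    """Agglomerate while the least similar pair between clusters >= level."""
    clusters = [{u} for u in units]

    def link(c1, c2):
        # Complete linkage: the LEAST similar pair of members decides.
        return min(sim[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        pairs = [(link(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        best, i, j = max(pairs)
        if best < level:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

units = ["A", "B", "C", "D"]
# Hypothetical co-efficients of similarity between pairs of units.
sim = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.6,
    frozenset(("B", "C")): 0.8, frozenset(("A", "D")): 0.3,
    frozenset(("B", "D")): 0.5, frozenset(("C", "D")): 0.7,
}

for level in (0.8, 0.7, 0.3):
    print(level, complete_linkage(units, sim, level))
```

With these values C does not join A and B at the 0.8 level, because its similarity to A is only 0.6; under single linkage the same data would admit it.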

3. Clustering by average linkage. 

This approach is suggested by Sokal and Michener (1958) primarily for use
with correlation co-efficient matrices but can be applied to various types of
similarity co-efficient matrices. Average linkage procedures base the admission
of an individual into an existing cluster on the average of the similarities of
that individual with the members of the cluster. 
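
A minimal sketch of the average linkage criterion follows, assuming the unweighted arithmetic average of all pairwise similarities between the members of two clusters. The similarity values (chosen to be exactly representable) and the function name are hypothetical.

```python
def average_linkage(units, sim, level):
    """Agglomerate while the mean similarity between clusters >= level."""
    clusters = [{u} for u in units]

    def link(c1, c2):
        # Average linkage: mean of all member-to-member similarities.
        values = [sim[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    while len(clusters) > 1:
        pairs = [(link(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        best, i, j = max(pairs)
        if best < level:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

units = ["A", "B", "C", "D"]
# Hypothetical co-efficients of similarity between pairs of units.
sim = {
    frozenset(("A", "B")): 0.875, frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.75,  frozenset(("A", "D")): 0.25,
    frozenset(("B", "D")): 0.5,   frozenset(("C", "D")): 0.5,
}

for level in (0.7, 0.6, 0.4):
    print(level, average_linkage(units, sim, level))
```

Here C is admitted to the cluster A, B at the average (0.5 + 0.75) / 2 = 0.625, a level between those at which single linkage (0.75) and complete linkage (0.5) would admit it.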


In summarizing the comparison between clustering techniques Sokal and 
Sneath (1963) recommend average linkage at the present stage of development 
of numerical taxonomy. This procedure re-calculates the similarity co-efficient 
matrix at regular intervals as the classification develops. Rohlf (1966) supports 
this suggestion, adding that “... our recent results show rather clearly that the
average linkage method (using arithmetic averages) is to be preferred on the 
basis of the fact that it results in classifications which show less distortion than 
when classifications are constructed by other means (such as single linkage or 
complete linkage cluster analysis)”. Programmes for average linkage analysis 
are available at the Microbial Systematics Research Unit, University of Leicester, 
Leicester, England (Sneath, 1966) and at the Department of Entomology, 
University of Kansas, Lawrence, U.S.A. (Rohlf, 1966). 


The Department of Statistics at Rothamsted Experimental Station has 
recently examined some of the present methods proposed for clustering and has 
compiled programmes for the classification of presence and absence data or 
quantitative data on plants and habitat (Orion Classification Programme 
CLASP). A second programme, which is still somewhat exploratory, has been 
developed primarily to compare clustering methods (Numerical Taxonomy 
Programme, NUT). Gower (1966) suggests that while association analysis may 
be useful for key-making, for general purpose classification the “Weighted 
Mean-Pair Group” method (Sokal & Michener, 1958) is recommended, using 
the authors’ formula (3). Gower stresses, however, that while the hierarchical
structure imposed by cluster analysis is probably valid for differentiating between 
higher and lower taxonomic orders, when cross-classifications exist, some form 
of multivariate analysis is indicated. 
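
A weighted pair-group procedure may be sketched as follows, in the form in which weighted pair-group averaging is commonly described: when two clusters fuse, the similarity of the new cluster to any other is taken as the plain mean of the two former similarities, whatever the sizes of the fused clusters. The sketch does not attempt to reproduce the authors' formula (3), which is not given here; the similarity values and the function name are hypothetical.

```python
def weighted_pair_group(units, sim):
    """Return the fusion history [(cluster_p, cluster_q, similarity), ...]."""
    clusters = [(u,) for u in units]          # clusters as tuples of unit names
    s = {frozenset(((a,), (b,))): sim[frozenset((a, b))]
         for i, a in enumerate(units) for b in units[i + 1:]}
    history = []
    while len(clusters) > 1:
        # Find the most similar pair among the current clusters.
        value, p, q = max((s[frozenset((p, q))], p, q)
                          for i, p in enumerate(clusters)
                          for q in clusters[i + 1:])
        new = p + q
        clusters = [c for c in clusters if c not in (p, q)]
        for c in clusters:
            # Weighted update: simple mean of the two former similarities.
            s[frozenset((new, c))] = (s[frozenset((p, c))]
                                      + s[frozenset((q, c))]) / 2
        clusters.append(new)
        history.append((p, q, value))
    return history

units = ["A", "B", "C", "D"]
# Hypothetical co-efficients of similarity between pairs of units.
sim = {
    frozenset(("A", "B")): 0.875, frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.75,  frozenset(("A", "D")): 0.25,
    frozenset(("B", "D")): 0.5,   frozenset(("C", "D")): 0.5,
}

for p, q, value in weighted_pair_group(units, sim):
    print(p, q, value)
```

With these values D finally joins at (0.5 + 0.375) / 2 = 0.4375, whereas the unweighted mean of its three similarities to A, B and C would be 1.25 / 3, or about 0.417; the difference arises because the later-joining unit C carries as much weight as A and B together.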


In a recent contribution by Williams, Lambert and Lance (1966) several of 
the fundamental issues involved in the choice of classificatory procedure are 
admirably dealt with from the theoretical viewpoint. However, these authors 
point out the difficulty inherent in assessing the ecological acceptability of 
classificatory procedures. The difficulty is to select suitable “objective criteria 
in an essentially subjective situation by which to differentiate between the 
analyses”. They add that within the test-communities, “the threshold for our 
acceptance of any of the hierarchical methods under study (including centroid 
sorting and information statistic) will be that the major groupings which arise 
shall not be fewer than, or markedly different from, those recognised intuitively 
as distinct ecological entities at the time the data were collected”. 

It would seem thus that some degree of subjective ecological integrity is
required in evaluating the various objective approaches to vegetation
classification and thus in survey work and mapping, emphasizing the fundamental
requirement of a broad ecological training for practising botanical survey staff.
While this paper does not propose the use of one classificatory procedure to the 
exclusion of others it may throw some light on the real problems confronting the 
field worker, who is too often groping instead of grouping. 


REFERENCES 


ANDERSON, A. J. B. 1966. Comparison of Clustering Methods in Numerical Taxonomy. 
M.Sc. Thesis, University of Aberdeen, Scotland. 

GOWER, J. A. 1966. Personal Communication. Rothamsted Exp. Sta. Herts., England.

GREIG-SMITH, P. 1966. Personal Communication, University College of N. Wales, Bangor.

GRUNOW, J. O. 1964. Objective classification of plant communities. S. Afr. J. Agric. Sc. 7:
lee 

IVIMEY-COOK, R. B. AND PROCTOR, M. C. F. 1966. The application of association-analysis
to phytosociology. J. Ecol. 54: 179. 

LANCE, G. N. AND WILLIAMS, W. T. 1965. Computer program for monothetic classification.
Computer J. 8: 246.

MILLER, P. 1966. Unpublished M.Sc. (Agric). Thesis, Univ. Natal. 

ROBERTS, B. R. 1966. The Ecology of Thaba ‘Nchu—a statistical study of vegetation/habitat
relations. Ph.D. Thesis. Univ. Natal. 

ROHLF, F. J. 1966. Personal Communication. University of Kansas, Lawrence, U.S.A.

SOKAL, R. R. AND MICHENER, C. D. 1958. A statistical method for evaluating systematic
relationships. Univ. Kansas Sci. Bull. 38: 1409.

SOKAL, R. R. AND SNEATH, P. H. A. 1963. Principles of Numerical Taxonomy. W. H.
Freeman, London. 

SORENSEN, T. 1948. A method of establishing groups of equal amplitude in plant
sociology. Biol. Skr. 5 (4): 1.

SNEATH, P. H. A. 1957. The application of computers to taxonomy. J. Gen. Microbiol.
17: 201. 

SNEATH, P. H. A. 1961. Recent developments in theoretical and quantitative taxonomy. 
System. Zool. 10: 118. 

SNEATH, P. H. A. 1966. Personal Communication. Univ. Leicester, Leics., England. 

VAN DER WALT, J. L. 1962. Ondersoek na ’n objektiewe metode van plantopname in die 
Sneeubergreeks. M.Sc. (Agric.) Thesis, University of Pretoria. 

WILLIAMS, W. T. AND LAMBERT, J. M. 1959. Multivariate methods in plant ecology. I. 
Association-analysis in plant communities. J. Ecol. 47: 83. 

WILLIAMS, W. T. AND LAMBERT, J. M. 1961a. Multivariate methods in plant ecology.
III. Inverse association-analysis. J. Ecol. 49: 717.

WILLIAMS, W. T. AND LAMBERT, J. M. 1961b. Nodal analysis of associated populations. 
Nature 191: 202. 

WILLIAMS, W. T., LAMBERT, J. M. AND LANCE, G. N. 1966. Multivariate methods in plant
ecology. V. Similarity analysis and information statistic. J. Ecol. 54: 427. 


